\subsection{Shingles}
An $N$-shingle is a fixed-size sequence of $N$ consecutive ``words''.
For example, the 4-shingles of the sentence ``only look at the bright side of life'' are ``only look at the'', ``look at the bright'', ``at the bright side'', ``the bright side of'' and ``bright side of life''.
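
The construction can be sketched in a few lines of Python. This is only an illustration, not code from our project: the function name \texttt{shingles} is hypothetical, and we assume that words are separated by whitespace.

```python
def shingles(text, n=4):
    """Return the n-shingles (runs of n consecutive words) of text, in order."""
    # Assumption: a "word" is any whitespace-separated token.
    words = text.split()
    # One shingle starts at every position that still has n words left.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


for s in shingles("only look at the bright side of life", 4):
    print(s)
```

A sentence of 8 words yields $8 - 4 + 1 = 5$ shingles, and a text with fewer than $N$ words yields none.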

\subsection{Jaccard Similarity}
Jaccard similarity can be used to check whether two pages are similar. It compares the sets of shingles of the two documents, $A$ and $B$.

The formula is: $\mathrm{Jaccard}(A,B) = \frac{|A \cap B|}{|A \cup B|}$.
Since Jaccard similarity can be expensive to compute, it is sensible to try to reduce the calculation time. One way is to hash the shingles, since checking whether two numbers are equal is faster than checking whether two strings are equal. Hash collisions might introduce a few errors, but the amount should be within reasonable tolerance.
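
A minimal Python sketch of the idea follows. It is an illustration under our own assumptions rather than a fixed implementation: the names \texttt{shingleset}, \texttt{hashed} and \texttt{jaccard} are hypothetical, CRC32 from the standard \texttt{zlib} module stands in for whatever hash function one would actually choose, and two empty sets are treated as identical by convention.

```python
import zlib


def shingleset(text, n=4):
    """Return the set of n-shingles of text (whitespace-separated words)."""
    words = text.split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def hashed(shingle_strings):
    """Map every shingle to a 32-bit integer (CRC32 is an assumed choice)."""
    return {zlib.crc32(s.encode("utf-8")) for s in shingle_strings}


def jaccard(a, b):
    """Jaccard similarity |A intersect B| / |A union B| of two sets."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


a = shingleset("only look at the bright side of life")
b = shingleset("always look at the bright side of life")

print(jaccard(a, b))                  # computed on the strings
print(jaccard(hashed(a), hashed(b)))  # computed on the hashes
```

In this example the two sentences share 4 of their 6 distinct shingles, so the similarity of the string sets is $4/6 \approx 0.67$. The hashed variant gives the same value unless two different shingles happen to collide.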

We have not used Jaccard similarity in our project.